IXIR: A statistical information distillation system
Abstract
The task of information distillation is to extract snippets from massive multilingual audio and textual document sources that are relevant to a given templated query. We present an approach that focuses on the sentence extraction phase of the distillation process. It selects document sentences according to their relevance to a query via statistical classification with support vector machines. The distinguishing contribution of the approach is a novel method for generating classification features. The features are extracted from charts, which are compilations of elements from various annotation layers such as word transcriptions, syntactic and semantic parses, and information extraction (IE) annotations. We describe a procedure for creating charts from documents and queries, paying special attention to query slots (free-text descriptions of names, organizations, topics, events and so on, around which templates are centered), and suggest various types of classification features that can be extracted from these charts. We observe a 30% relative improvement due to non-lexical annotation layers and perform a detailed analysis of the contribution of each of these layers to classification performance. © 2009 Elsevier Ltd. All rights reserved.
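As a rough illustration of the chart idea described in the abstract, the sketch below builds a toy chart for a query slot and for a candidate sentence, then derives one overlap feature per annotation layer. Everything in it is an assumption made for illustration: the `Element` type, the `build_chart` and `overlap_features` helpers, the layer names `word` and `ne`, and the overlap ratio itself are hypothetical and are not taken from the paper, which does not specify its feature definitions here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One chart entry (hypothetical structure, not the paper's)."""
    layer: str   # annotation layer, e.g. "word" or "ne" (named entity)
    start: int   # index of the first token covered
    end: int     # index one past the last token covered
    label: str   # word form or annotation label


def build_chart(tokens, annotations=()):
    """Collect lowercased word elements plus extra (layer, start, end, label) tuples."""
    chart = [Element("word", i, i + 1, tok.lower()) for i, tok in enumerate(tokens)]
    chart.extend(Element(*a) for a in annotations)
    return chart


def overlap_features(sentence_chart, slot_chart):
    """Per layer, the fraction of slot labels that also occur in the sentence."""
    features = {}
    for layer in sorted({e.layer for e in slot_chart}):
        slot_labels = {e.label for e in slot_chart if e.layer == layer}
        sent_labels = {e.label for e in sentence_chart if e.layer == layer}
        features[f"{layer}_overlap"] = len(slot_labels & sent_labels) / len(slot_labels)
    return features


# Toy query slot and candidate sentence (invented data).
slot = build_chart(["Acme", "Corp"], [("ne", 0, 2, "ORG")])
sentence = build_chart(
    ["Police", "raided", "Acme", "offices"],
    [("ne", 2, 3, "ORG")],
)
features = overlap_features(sentence, slot)
print(features)
```

In a full system, feature vectors of this kind would be passed to a support vector machine that scores each sentence for relevance; that training step is omitted here, and the non-lexical `ne` layer stands in for the richer syntactic, semantic, and IE layers the abstract mentions.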
Journal:
- Computer Speech & Language
Volume 23, Issue
Pages -
Publication date: 2009